BioLite, a Lightweight Bioinformatics Framework with Automated Tracking of Diagnostics and Provenance

Authors

  • Mark Howison
  • Nicholas A. Sinnott-Armstrong
  • Casey W. Dunn
Abstract

We present a new Python/C++ framework, BioLite, for implementing bioinformatics pipelines for Next-Generation Sequencing (NGS) data. BioLite tracks provenance of analyses, automates the collection and reporting of diagnostics (such as summary statistics and plots at intermediate stages), and profiles computational requirements. These diagnostics can be accessed across multiple stages of a pipeline, from other pipelines, and in HTML reports. Finally, we describe several use cases for diagnostics in our own analyses.
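The idea of recording provenance, diagnostics, and timing for each pipeline stage can be illustrated with a minimal Python sketch. This is not BioLite's API: `run_stage`, `drop_short_reads`, and the record fields are hypothetical names chosen for illustration, and the in-memory list stands in for whatever store a real framework would use.

```python
import hashlib
import json
import time


def run_stage(name, func, data, log):
    """Run one pipeline stage and append a provenance/diagnostics record to log.

    Hypothetical helper for illustration only; not part of BioLite.
    """
    start = time.perf_counter()
    result = func(data)
    elapsed = time.perf_counter() - start
    log.append({
        "stage": name,
        # Provenance: a hash of the input identifies exactly what was processed.
        "input_sha1": hashlib.sha1(
            json.dumps(data, sort_keys=True).encode()
        ).hexdigest(),
        # Diagnostics: simple summary statistics at this intermediate stage.
        "diagnostics": {"n_in": len(data), "n_out": len(result)},
        # Profiling: wall-clock time spent in the stage.
        "seconds": elapsed,
    })
    return result


def drop_short_reads(reads, min_len=4):
    """Toy stage: keep only reads of at least min_len bases."""
    return [r for r in reads if len(r) >= min_len]


log = []
reads = ["ACGT", "AC", "GGGTTA", "T"]
kept = run_stage("filter", drop_short_reads, reads, log)
```

Because every stage appends to the same `log`, a later stage or a separate reporting step could read the earlier records, which is the kind of cross-stage access to diagnostics the abstract describes.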

Similar resources

Mapping the NRC Dataflow Model to the Open Provenance Model

The Open Provenance Model (OPM) has recently been proposed as an exchange framework for workflow provenance information. In this paper we show how the NRC data model for workflow repositories can be mapped to the OPM. Our mapping includes such features as complex data flow in an execution of a workflow; different workflows in the repository that call each other; and the tracking of subvalues of...

Semantic Representation of Provenance in Wikipedia

Wikis are often considered as being a wide source of information. However, identifying provenance information about their content is crucial, whether it is for computing trust in public wiki pages or to identify experts in corporate wikis. In this paper, we address this issue by providing a lightweight ontology for provenance management in wikis, based on the W7 model. Furthermore, we showcase ...

BioWorkbench: A High-Performance Framework for Managing and Analyzing Bioinformatics Experiments

Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing (HPC) techniques and can benefit from specialized technologies such as Scientific Workflow Management Systems (SWfMS) and databases. In this w...

Wrangling Galaxy’s reference data

The Galaxy platform has developed into a fully featured collaborative workbench, with goals of inherently capturing provenance to enable reproducible data analysis, and of making it straightforward to run one's own server. However, many Galaxy platform tools rely on the presence of reference data, such as alignment indexes, to function efficiently. Until now, the building of this cac...

Tracking Provenance in ORNL’s Flexible Research Platforms

Provenance is defined as information about the origin of objects, a concept that applies to both physical and digital objects and often overlaps both. The use of provenance in systems designed for research is an important but forgotten feature. Provenance allows for proper and exact tracking of information, its use, its lineage, its derivations and other metadata that are important for correctl...

Publication date: 2012